Skip to content

fix(network-agent): allow dots in sandboxID validation#163

Open
rogeroger-yu wants to merge 1 commit into
TencentCloud:masterfrom
rogeroger-yu:fix-net-name-verify
Open

fix(network-agent): allow dots in sandboxID validation#163
rogeroger-yu wants to merge 1 commit into
TencentCloud:masterfrom
rogeroger-yu:fix-net-name-verify

Conversation

@rogeroger-yu
Copy link
Copy Markdown

Summary

  • Fix overly strict sandboxID validation in network-agent's state_store that
    rejected any ID containing . (dot). IDs like cubesandbox-python-slim-3.11-nydus_0
    (containing version numbers) were incorrectly flagged as path traversal.
    Changed to only reject /, \, and .. — consistent with Cubelet's
    pathutil.ValidateID().
  • Fix ansible control role crash when /etc/NetworkManager/dnsmasq.d does not
    exist (hosts using systemd-resolved instead of NM-dnsmasq).
  • Skip nydus-image tarball download in nydus-uffd role when the binary is
    already pre-staged locally.

Root Cause

state_store.go:path() used strings.ContainsAny(sandboxID, "/\\.") which
treats a single . the same as path separators. A dot in a version string
(e.g. 3.11) is perfectly safe — only .. is a traversal risk.

Test Plan

  • Template creation with dotted image tag succeeds end-to-end
  • Full multi-node ansible deploy passes (0 failures across 5 nodes)
  • All compute nodes report HEALTHY after network-agent restart

Original Error

<ie-training/cubesandbox-python-slim:3.11 --writable-layer-size 1G --storage-media nydus --expose-port 49983 --probe 49983 --probe-path /health --template-id cubesandbox-python-slim-3.11-nydus
job_id: d94ebd92-785e-4cfa-ba7d-b19e9df80aa8
template_id: cubesandbox-python-slim-3.11-nydus
attempt_no: 1
operation: CREATE
artifact_id:
status: PENDING
phase: PULLING
progress: 0%
distribution: 0/0 ready, 0 failed
root@ak-ai-engine-sandbox05:~/Workspace/code/cube-sandbox# /usr/local/services/cubetoolbox/CubeMaster/bin/cubemastercli tpl watch --job-id d94ebd92-785e-4cfa-ba7d-b19e9df80aa8
job_id: d94ebd92-785e-4cfa-ba7d-b19e9df80aa8
template_id: cubesandbox-python-slim-3.11-nydus
attempt_no: 1
operation: CREATE
artifact_id: rfs-85671e04f4cdfcd700f581a1
status: FAILED
phase: CREATING_TEMPLATE
progress: 100%
distribution: 1/1 ready, 0 failed
template_spec_fingerprint: 85671e04f4cdfcd700f581a19887ccdf7de95430f820b195973b99d2687f8ab3
template_status: FAILED
error: template cubesandbox-python-slim-3.11-nydus creation failed: network-agent EnsureNetwork failed: grpc_code=Unknown grpc_msg=invalid sandboxID "cubesandbox-python-slim-3.11-nydus_0": contains path separators or traversal characters
cubemastercli run fail: template cubesandbox-python-slim-3.11-nydus creation failed: network-agent EnsureNetwork failed: grpc_code=Unknown grpc_msg=invalid sandboxID "cubesandbox-python-slim-3.11-nydus_0": contains path separators or traversal characters

@rogeroger-yu rogeroger-yu requested a review from fslongjin as a code owner May 8, 2026 08:29
@fslongjin
Copy link
Copy Markdown
Member

/cubebot review

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 8, 2026

Claude finished @fslongjin's task in 1m 10s —— View job


PR Review: fix(network-agent): allow dots in sandboxID validation

  • Read and understand the code change
  • Review for correctness, security, and consistency
  • Post review feedback

Verdict: LGTM — the fix is correct and well-aligned with existing patterns

The change (state_store.go:95):

// Before:
if strings.ContainsAny(sandboxID, `/\.`) || sandboxID == "" {

// After:
if sandboxID == "" || strings.ContainsAny(sandboxID, `/\`) || strings.Contains(sandboxID, "..") {

Correctness

The fix correctly distinguishes between a single . (safe, used in version strings like 3.11) and .. (path traversal). The original ContainsAny(sandboxID, /.) treated any dot as dangerous, which broke IDs like cubesandbox-python-slim-3.11-nydus_0.

Consistency with codebase

This now matches exactly the pattern used in Cubelet/pkg/pathutil/validate.go:17:

if strings.ContainsAny(id, `/\`) || strings.Contains(id, "..") {

It also aligns with Cubelet/pkg/utils/pathsec.go:21-22 which rejects ., .., and .. substrings:

if strings.ContainsAny(untrusted, `/\`) || untrusted == "." || untrusted == ".." ||
    strings.Contains(untrusted, "..") {

One subtle difference: pathsec.SafeJoinPath additionally rejects the bare . component (untrusted == "."). The state_store.path() function doesn't reject a bare . sandboxID. This is also consistent with pathutil.ValidateSafeID() which doesn't reject bare . either. Since the path() function appends .json, a bare . would produce <dir>/.json — a hidden file, not a traversal. This is a pre-existing behavior and not a regression introduced by this PR.

Security

Path traversal protection remains intact:

  • / and \ are still rejected (path separators)
  • .. is still rejected (traversal sequence)
  • Single . in version strings like 3.11 is correctly allowed

Suggestion (non-blocking)

The existing state_store_test.go only tests the happy path. Adding a few validation test cases would prevent regressions and document the expected behavior:

func TestStateStorePathValidation(t *testing.T) {
    store, _ := newStateStore(t.TempDir())
    tests := []struct {
        id      string
        wantErr bool
    }{
        {"valid-id-123", false},
        {"cubesandbox-python-slim-3.11-nydus_0", false},  // single dots ok
        {"", true},                                        // empty
        {"foo/bar", true},                                 // forward slash
        {"foo\\bar", true},                                // backslash
        {"foo..bar", true},                                // double dot
        {"../etc/passwd", true},                           // traversal prefix
    }
    for _, tt := range tests {
        _, err := store.path(tt.id)
        if (err != nil) != tt.wantErr {
            t.Errorf("path(%q) err=%v, wantErr=%v", tt.id, err, tt.wantErr)
        }
    }
}

This mirrors the test coverage that already exists for pathutil.ValidateSafeID() in Cubelet/pkg/pathutil/validate_test.go.

Fix this →


@fslongjin
Copy link
Copy Markdown
Member

@rogeroger-yu Hi, could you please add some unit tests to stabilize this behavior?

@rogeroger-yu
Copy link
Copy Markdown
Author

@fslongjin okey.

@fslongjin
Copy link
Copy Markdown
Member

After reviewing this, I don't think this is the right way to fix it. The sandbox ID should be designed as a UUID. The root cause here is that when CubeMaster generates the template, it sends the template OCI image name​ to Cubelet as the sandbox ID.
A possible solution would be to modify CubeMaster to either generate a UUID or pass down the template ID​ to the Cubelet layer.
Note:​ For normal sandbox creation requests, the sandbox ID is generated by Cubelet. However, the current step is creating a sandbox template​ (for app snapshots), which is why CubeMaster specifies the sandbox ID.

@rogeroger-yu rogeroger-yu force-pushed the fix-net-name-verify branch from e15e0a4 to 16fd291 Compare May 9, 2026 06:38
@rogeroger-yu
Copy link
Copy Markdown
Author

@fslongjin I agree with your judgment and resubmit a version. I will review it after my hard work.


templateID, ok := opts.GetSnapshotTemplateID()
if !ok {
if _, ok := opts.GetSnapshotTemplateID(); !ok {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if statement should be dropped.

@rogeroger-yu rogeroger-yu force-pushed the fix-net-name-verify branch from 16fd291 to 5a7bee8 Compare May 9, 2026 08:44
Comment thread network-agent/internal/service/state_store_test.go
Cubelet: generate UUID-based sandboxID for snapshot creation
@rogeroger-yu rogeroger-yu force-pushed the fix-net-name-verify branch from 5a7bee8 to 4f8cedb Compare May 9, 2026 09:47
@rogeroger-yu rogeroger-yu requested a review from chenhengqi May 11, 2026 02:48
@chenhengqi
Copy link
Copy Markdown
Collaborator

cc @fslongjin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants